{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5335174c",
   "metadata": {},
   "source": [
    "## 8.2 VGGNet"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02d6109a",
   "metadata": {},
   "source": [
    "论文可见于：[论文链接](https://arxiv.org/abs/1409.1556)\n",
    "\n",
    "VGGNet是Oxford的Visual Geometry Group的团队在ILSVRC 2014上的相关工作。\n",
    "\n",
    "本文作者Karen Simonyan和Andrew Zisserman当时都是牛津大学工程科学系Visual Geometry Group成员，这也是VGG名字的由来。其中一作Karen后曾在DeepMind担任研究员，现在在一家研究AI人机交互的初创公司Inflection AI担任联合创始人兼首席科学家。二作Andrew Zisserman在牛津大学担任计算机视觉工程教授，同时是英国皇家学会教授。\n",
    "\n",
    "从论文摘要中就能看出几个关键点的研究方向，论文主要研究卷积网络深度对模型的影响，使用3x3的小卷积，深度提升到11-19层。并且以此在2014年的ImageNet挑战赛在定位和分类问题中分获第一和第二，并且推广到其他数据集上同样取得了SOTA的结果。\n",
    "\n",
    "同时VGGNet的拓展性很强，迁移到其他图片数据上的泛化性非常好。VGGNet的结构非常简洁，整个网络都使用了小尺寸的卷积核以及相同的池化尺寸（2x2）。到目前为止，VGGNet依然经常被用来提取图像特征，论文引用次数已经超过9万，被广泛应用于视觉领域的各类任务。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5da5510c",
   "metadata": {},
   "source": [
    "### 8.2.1 VGGNet基本思想"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9e0d8f4",
   "metadata": {},
   "source": [
    "它的主要思想其实是在于解决了以下几个问题：\n",
    "\n",
    "1. 在当时的卷积神经网络模型中，网络深度越深，表现就越好，但同时也会带来较大的计算复杂度和较高的模型大小。VGGNet通过设计一种更深但同时较小的卷积神经网络，解决了这一问题。\n",
    "\n",
    "2. 在训练过程中，卷积神经网络容易出现过拟合的问题，导致泛化能力较差。VGGNet通过设计合理的网络结构和使用较小的卷积核，提高了模型的泛化能力。\n",
    "\n",
    "3. 在计算机视觉任务中，网络的计算效率是一个很重要的问题。VGGNet通过使用小核卷积和简单的网络结构，在保证较高的精度的同时，提高了模型的计算效率。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f515ef1",
   "metadata": {},
   "source": [
    "### 8.2.2 VGGNet结构"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93381be6",
   "metadata": {},
   "source": [
    "VGGNet结构如下图所示：\n",
    "\n",
    "<img src=\"./images/8-2-1.png\" width=\"100%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc40e182",
   "metadata": {},
   "source": [
    "VGGNet是一个典型的卷积神经网络模型，由若干个卷积层和全连接层组成。从上面的结构中我们可以看到，VGGNet包含多个级别的网络，深度从11到19层不等。其网络结构主要可以分为三个部分：\n",
    "\n",
    "1. 卷积层：VGGNet的卷积层卷积核的大小均为3x3，且卷积核的个数相同。\n",
    "\n",
    "2. 池化层：VGGNet在卷积层之后使用了最大池化层来缩小图像的尺寸，并通过池化层的下采样来减少计算复杂度。\n",
    "\n",
    "3. 全连接层：VGGNet的全连接层包括两个4096维的全连接层和一个输出层。输出层的大小取决于任务的类别数。\n",
    "\n",
    "最后经过一个sofmax得到最终类别上的概率分布。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab51347e",
   "metadata": {},
   "source": [
    "以VGG16为例，其网络结构图经可视化后如下所示：\n",
    "\n",
    "<img src=\"./images/8-2-2.png\" width=\"80%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d348f5f0",
   "metadata": {},
   "source": [
    "VGG16参数量计算参考下图：\n",
    "\n",
    "<img src=\"./images/8-2-3.png\" width=\"50%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6eb95eb",
   "metadata": {},
   "source": [
    "### 8.2.3 为什么选用小卷积核"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78a03caa",
   "metadata": {},
   "source": [
    "VGGNet中使用的都是3x3的小卷积核，本质上讲，两个3x3卷积层的串联相当于1个5x5的卷积层，如下图所示。而3个3x3的卷积层串联相当于1个7x7的卷积层，即3个3x3卷积层的感受野大小相当于1个7x7的卷积层。但是3个3x3的卷积层参数量只有7x7的一半左右，同时前者可以有3个非线性操作，而后者只有1个非线性操作。这使得前者对于特征的学习能力更强，同时参数更少。\n",
    "\n",
    "<img src=\"./images/8-2-4.png\" width=\"50%\"></img>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5348f6d",
   "metadata": {},
   "source": [
    "### 8.2.4 VGGNet代码实现"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8cd91154",
   "metadata": {},
   "source": [
    "接下来看一下如何用代码实现相关网络结构。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "341a636b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入必要的库\n",
    "import torch\n",
    "import torch.nn as nn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23b6e770",
   "metadata": {},
   "source": [
    "首先定义VGGNet的网络结构。其中features是卷积部分提取特征的网络结构，这个下面会展开讲。classifier是最后全连接层的部分，生成分类部分的网络结构。forward函数定义前向传播过程，描述各层间的连接关系。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "05481332",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义VGGNet的网络结构\n",
    "class VGG(nn.Module):\n",
    "    def __init__(self, features, num_classes=1000):\n",
    "        super(VGG, self).__init__()\n",
    "        self.features = features\n",
    "        self.classifier = nn.Sequential(\n",
    "            nn.Linear(512 * 7 * 7, 4096),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Dropout(),\n",
    "            nn.Linear(4096, 4096),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Dropout(),\n",
    "            nn.Linear(4096, num_classes),\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.features(x)\n",
    "        x = torch.flatten(x, 1)\n",
    "        x = self.classifier(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa812ad3",
   "metadata": {},
   "source": [
    "定义相关配置项。其中每个key都代表一个模型的配置文件，数字代表卷积层中卷积核的个数，'M'表示池化层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "30357203",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义相关配置项，其中M表示池化层\n",
    "cfgs = {\n",
    "    'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],\n",
    "    'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],\n",
    "    'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],\n",
    "    'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95821ca9",
   "metadata": {},
   "source": [
    "定义生成卷积部分网络结构的函数。遍历传入的配置列表，拼接出对应的网络结构。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "44d2692e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 根据配置项拼接卷积层\n",
    "def make_layers(cfg):\n",
    "    layers = []\n",
    "    in_channels = 3\n",
    "    for v in cfg:\n",
    "        if v == 'M':\n",
    "            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]\n",
    "        else:\n",
    "            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)\n",
    "            layers += [conv2d, nn.ReLU(inplace=True)]\n",
    "            in_channels = v\n",
    "    return nn.Sequential(*layers)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3672aaf1",
   "metadata": {},
   "source": [
    "封装对应模型的函数，直接调用即可。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6a9fc781",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 封装函数\n",
    "def vgg11():\n",
    "    return VGG(make_layers(cfgs['vgg11']))\n",
    "\n",
    "def vgg13():\n",
    "    return VGG(make_layers(cfgs['vgg13']))\n",
    "\n",
    "def vgg16():\n",
    "    return VGG(make_layers(cfgs['vgg16']))\n",
    "\n",
    "def vgg19():\n",
    "    return VGG(make_layers(cfgs['vgg19']))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81ed20cc",
   "metadata": {},
   "source": [
    "### 8.2.5 小结"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68a05d12",
   "metadata": {},
   "source": [
    "* VGGNet的结构非常简洁，整个网络都使用了同样大小的卷积核尺寸和池化层。\n",
    "* 几个小滤波器（3x3）卷积层的组合比一个大滤波器（5x5或7x7）卷积层好，一方面减少参数，另一方面拥有更多的非线性变换。\n",
    "* 验证了通过加深网络结构可以提升性能。\n",
    "* VGGNet不使用局部响应归一化（LRN），这种标准化并不能在ILSVRC数据集上提升性能，却导致更多的内存消耗和计算时间。\n",
    "* VGGNet最后全连接层的参数量很大，可能会导致训练缓慢。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
